Year: 2011 | Volume: 57 | Issue: 2 | Pages: 149-155
A Low-power and High-performance Radix-4 Multiplier Design Using a Modified Pass-transistor Logic Technique
Faculty of Engineering and Technology, Multimedia University, Jalan Ayer Keroh Lama, 75450 Melaka, Malaysia
Date of Web Publication: 30-May-2011
Abstract
This paper describes a 1-bit full adder designed using a modified complementary pass-transistor logic (CPL) technique. The proposed adder was implemented in an 8 × 8 bit high-radix multiplier circuit, with the aim of achieving higher speed (lower propagation delay), smaller area and lower power dissipation. The multiplier circuits were schematized using the DSCH2 schematic design tool, and their layouts were generated with the Microwind 2 VLSI layout CAD tool. Parameter analyses were performed with a BSIM4 analyzer. For comparison with the proposed adder cell-based high-radix multiplier, two unsigned multipliers were also designed using the proposed modified CPL adder cell: a carry save array (CSA) multiplier and a Baugh-Wooley multiplier. These two multipliers, as well as other existing multipliers, were compared with the high-radix multiplier circuit in terms of power dissipation, propagation delay, latency, throughput, energy per instruction (EPI) and area. The proposed 1-bit adder and the adder-based high-radix multiplier demonstrated better performance than other published results.
Keywords: Complementary pass-transistor logic, High-radix multiplier, Radix-4 multiplier, VLSI CAD tool
How to cite this article:
Senthilpari C. A Low-power and High-performance Radix-4 Multiplier Design Using a Modified Pass-transistor Logic Technique. IETE J Res 2011;57:149-55
How to cite this URL:
Senthilpari C. A Low-power and High-performance Radix-4 Multiplier Design Using a Modified Pass-transistor Logic Technique. IETE J Res [serial online] 2011 [cited 2014 Mar 7];57:149-55. Available from: http://jr.ietejournals.org/text.asp?2011/57/2/149/81744
1. Introduction
Multiplication is one of the most important arithmetic operations. The prevalence of portable electronic devices, such as cellular phones, PDAs and digital cameras, creates a need for energy-efficient arithmetic hardware. A high-speed multiplier is greatly desired, since multiplication accounts for most of the execution time in many digital signal processor (DSP) devices. Three important issues must be considered in very large scale integration (VLSI) design: chip area, computation speed and power dissipation. Many multiplier designs have been proposed that operate with lower propagation delay and lower power dissipation. In terms of speed, the Braun, Booth and high-radix multipliers are the fastest types of multipliers. Parallel multipliers carry out high-speed operations, but at the cost of a large circuit area and high power consumption. Therefore, one of the most important tasks in VLSI design is to reduce power consumption and area whilst retaining high performance.
Addition is a key component of other arithmetic operations, such as subtraction, multiplication and division. The proposed multiplier circuits use two types of adder cells to implement the multiplication algorithm: the half adder cell and the full adder cell. Basic multiplication can be realized by the shift-add algorithm, which generates partial products and adds them after shifting each into its proper position. Multiplication time is therefore proportional to the number of partial products to be added. High-radix multiplication algorithms reduce the number of partial products by handling more than 1 bit of the multiplier in each cycle, so fewer cycles are required at higher radices: a radix-2^k multiplier retires k bits per cycle and needs n/k cycles for an n-bit multiplier. Furthermore, the reduced number of cycles, combined with recoding and carry-save addition to simplify the computation in each cycle, allows a significant improvement in the speed of high-radix multipliers.
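The effect of the radix on the number of partial products can be sketched in a few lines of Python. This is a behavioural illustration that assumes unsigned operands and no recoding; `partial_products` is a hypothetical helper, not part of the proposed circuit.

```python
def partial_products(a: int, x: int, n_bits: int, k: int) -> list[int]:
    """Shifted partial products of a * x when the n_bits-wide multiplier x
    is scanned k bits at a time (radix 2**k, unsigned, no recoding)."""
    if n_bits % k:
        raise ValueError("n_bits must be a multiple of k")
    mask = (1 << k) - 1
    products = []
    for j in range(n_bits // k):
        digit = (x >> (j * k)) & mask            # j-th radix-2**k digit of x
        products.append((digit * a) << (j * k))  # digit multiple, shifted into place
    return products


a, x = 0xB7, 0x5C  # arbitrary 8-bit example operands
for k in (1, 2, 4):
    pps = partial_products(a, x, 8, k)
    assert sum(pps) == a * x                     # adding them always gives the product
    print(f"radix-{2 ** k}: {len(pps)} partial products")
```

For an 8-bit multiplier this yields 8, 4 and 2 partial products at radix 2, 4 and 16. The price of a higher radix is that digit multiples such as 3a must be formed, which is why the radix-4 design in Section 3 pre-computes 3a.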
This paper proposes an adder cell that combines the multiplexing control input technique (MCIT) with the complementary pass-transistor logic (CPL) technique. The adder cells are implemented in an 8 × 8 bit high-radix (radix-4) multiplier. The simulations of the 8 × 8 bit high-radix multipliers use complementary metal oxide semiconductor (CMOS) design rules for low-power, high-speed applications. The proposed adder-based radix-4 multiplier is compared with Baugh-Wooley and CSA multipliers built from the same adder cell, and with multipliers proposed by other authors, in terms of power dissipation, propagation delay and total chip area. Our adder-based multiplier demonstrates better performance than the other multiplier circuits.
2. Adder Architecture Using the CPL Technique
The CPL technique eliminates the P-type metal oxide semiconductor (PMOS) latch and overcomes the threshold-voltage loss of pass-transistor logic by adding an inverter at the output. The CPL logic style uses fewer transistors and presents smaller input loads, especially when N-type metal oxide semiconductor (NMOS) networks are used. However, CPL circuits have some drawbacks due to the body effect, source-follower action and high leakage power. Without cross-coupling, CPL performs poorly in long chains of stages and has limited fan-out capability. According to Markovic et al. [8], the duality principle of the CPL circuit topology means that inverting the gate signals yields the dual logic function; dual pairs include AND-OR, NAND-NOR and XOR-XNOR. Starting from the basic pass-transistor structure, AND, OR, NAND and NOR gates can be constructed simply by modifying the input nodes, and XOR and XNOR gates can be constructed by changing the inputs at the source terminals.
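The idea that one pass-transistor structure yields different gates just by changing what is applied to its inputs can be illustrated with a switch-level Python sketch. It assumes ideal switches (no threshold drop, no body effect) and models a two-transistor pass network as a 2:1 multiplexer whose gates are driven by A and A'. The source-terminal assignments below are one common textbook choice, assumed for illustration rather than transcribed from the paper's schematics.

```python
from itertools import product


def pass_network(gate: int, src_on_gate: int, src_on_gate_bar: int) -> int:
    """Ideal two-transistor pass network: the NMOS gated by `gate` passes
    `src_on_gate`; the NMOS gated by its complement passes `src_on_gate_bar`."""
    return src_on_gate if gate else src_on_gate_bar


# Same structure, different source-terminal inputs (B' is written as 1 - b).
def and_(a, b):
    return pass_network(a, b, 0)


def or_(a, b):
    return pass_network(a, 1, b)


def nand(a, b):
    return pass_network(a, 1 - b, 1)


def nor(a, b):
    return pass_network(a, 0, 1 - b)


def xor(a, b):
    return pass_network(a, 1 - b, b)


def xnor(a, b):
    return pass_network(a, b, 1 - b)


for a, b in product((0, 1), repeat=2):
    assert and_(a, b) == (a & b) and or_(a, b) == (a | b)
    assert nand(a, b) == 1 - (a & b) and nor(a, b) == 1 - (a | b)
    assert xor(a, b) == (a ^ b) and xnor(a, b) == 1 - (a ^ b)
    # Duality: complemented inputs into the AND network, complemented output -> OR.
    assert or_(a, b) == 1 - and_(1 - a, 1 - b)
```

Because CPL carries both polarities of every signal, the complemented inputs and outputs used in the duality check are available without extra inverters.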
The proposed full adder cell uses the CPL technique and MCIT for both the sum and carry operations. The sum operation is based on Equation (1), which uses two XOR gates, since pass-transistor logic is well suited to constructing XOR gates. The carry circuit is based on Equation (2):

$$\text{Sum} = A \oplus B \oplus C_{in} \tag{1}$$

$$C_{out} = AB + (A \oplus B)\,C_{in} \tag{2}$$

By combining the sum and carry circuits, the XOR gate in the carry operation can be omitted, because both circuits share the common term A⊕B.
The inputs A, its complement A', B and its complement B' are fed to pass transistors to form an XOR gate; at the transistor level this XOR is built from two transistors. To reduce the transistor count, the XOR output (A⊕B) is passed through a NOT gate from the differential node and applied to the pass transistors as a control input, while Cin is treated as a variable input fed through the pass-transistor source terminal. At this stage the circuit implements the sum operation, A⊕B⊕Cin, using six transistors. As mentioned above, the carry operation can also be simplified by ANDing the A⊕B output of the sum circuit with Cin to produce (A⊕B)Cin, which needs only two more transistors. Meanwhile, A, A', B and B' are fed into pass transistors to form an AND gate that produces the AB term of Equation (2). The outputs (A⊕B)Cin and AB are used as multiplexing inputs and combined by an OR operation. The transistor count is reduced further by modifying the OR gate at the last stage of the carry circuit: the inverter and the transistor driven by that inverter are removed. In total, 3 transistors are omitted at this stage, and the full adder cell is reduced to 17 transistors, fewer than the 22 transistors of Markovic's full adder circuit [8]. [Figure 1] shows the proposed 17-transistor full adder circuit after applying this redundant-transistor reduction technique. The basic architectures of the Baugh-Wooley multiplier and the 4 × 4 bit CSA multiplier were constructed following Yeo et al. [1].
In both multiplier architectures, the full adder blocks were replaced with our proposed full adder cell, and all logic gates were designed using the CPL technique, so that performance could be compared under identical conditions.
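Equations (1) and (2), and the sharing of the A⊕B term between the sum and carry circuits, can be checked exhaustively in Python. The second function is a hypothetical multiplexer-style view in which A⊕B acts as the control input and Cin as the data input, in the spirit of MCIT; it is a behavioural sketch, not the transistor netlist of [Figure 1].

```python
from itertools import product


def full_adder_equations(a: int, b: int, cin: int) -> tuple[int, int]:
    p = a ^ b                       # common term A xor B, shared by sum and carry
    s = p ^ cin                     # Equation (1): Sum = A xor B xor Cin
    cout = (a & b) | (p & cin)      # Equation (2): Cout = AB + (A xor B)Cin
    return s, cout


def full_adder_mux(a: int, b: int, cin: int) -> tuple[int, int]:
    """Assumed multiplexer view: A xor B selects what each output passes."""
    p = a ^ b
    s = (1 - cin) if p else cin     # control high passes Cin', low passes Cin
    cout = cin if p else a          # when A xor B = 0, A == B, so AB reduces to A
    return s, cout


for a, b, cin in product((0, 1), repeat=3):
    total = a + b + cin
    expected = (total & 1, total >> 1)   # (sum bit, carry bit)
    assert full_adder_equations(a, b, cin) == expected
    assert full_adder_mux(a, b, cin) == expected
```

Once A⊕B exists, neither output needs a second XOR of A and B, which is the saving the shared-term design exploits.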
3. Architecture of the Proposed Radix-4 Multiplier
As shown in [Figure 2], the proposed radix-4 multiplier comprises partial product selectors, partial product pre-computation blocks, and half adder and full adder blocks. A radix-4 circuit processes 2 bits of the multiplier per cycle, so the four multiples 0a, 1a, 2a and 3a are pre-computed, where "a" is the multiplicand. In the pre-computation blocks, 2a is simply a shifted version of "a", and 3a = 2a + 1a. The pre-computation circuit for 3a consists of half adder and full adder blocks configured as a ripple carry adder (RCA). The half adder circuit is designed with the CPL technique, and the full adder blocks use our proposed full adder circuit. The partial product selectors, formed by AND and OR gates, determine the partial products. Connecting the pre-computation blocks to the partial product selectors realizes a 4-to-1 multiplexer, as shown in [Figure 3]. The multiplexer uses the first 2 bits of the multiplier, x, to select the first partial product, then moves on to the next 2 bits of x to select each successive partial product in the same way. A 4-bit radix-4 multiplier therefore generates two partial products, half as many as the conventional 1-bit shift-add algorithm.
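The pre-computation and selection steps can be sketched behaviourally in Python: a ripple carry adder built from the full adder of Equations (1) and (2) forms 3a = 2a + a, and an AND-OR selector decodes each 2-bit multiplier digit into one-hot select lines. All function names are hypothetical, and gate-level timing is not modelled.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    p = a ^ b                              # shared A xor B term
    return p ^ cin, (a & b) | (p & cin)    # Equations (1) and (2)


def ripple_carry_add(x: int, y: int, width: int) -> int:
    """Unsigned width-bit addition; the carry ripples from the LSB upward."""
    result, carry = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result | (carry << width)       # final carry becomes the MSB


def precompute_multiples(a: int, n_bits: int) -> list[int]:
    """0a, 1a, 2a and 3a for an n_bits-wide multiplicand a."""
    two_a = a << 1                                      # 2a is just a wired shift
    three_a = ripple_carry_add(two_a, a, n_bits + 1)    # 3a = 2a + 1a on an RCA
    return [0, a, two_a, three_a]


def select_partial_product(multiples: list[int], digit: int) -> int:
    """AND-OR 4-to-1 selector driven by a 2-bit multiplier digit (x1, x0)."""
    x1, x0 = (digit >> 1) & 1, digit & 1
    selects = [(1 - x1) & (1 - x0), (1 - x1) & x0, x1 & (1 - x0), x1 & x0]  # one-hot
    pp = 0
    for sel, multiple in zip(selects, multiples):
        pp |= multiple * sel   # AND: a 0/1 select line gates the multiple; OR: combine
    return pp


mults = precompute_multiples(0xB7, 8)
print([select_partial_product(mults, d) for d in range(4)])  # [0, 183, 366, 549]
```

Because exactly one select line is high for any digit, ORing the gated multiples never mixes two of them.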
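Before the partial products are added, the AND-gated pre-computed multiples are ORed together. This is valid because each 2-bit multiplier digit ($x_{2i+1}$, $x_{2i}$) activates exactly one select line, so at most one term is nonzero:

$$PP_i = \bar{x}_{2i+1}\bar{x}_{2i}\cdot 0a + \bar{x}_{2i+1}x_{2i}\cdot 1a + x_{2i+1}\bar{x}_{2i}\cdot 2a + x_{2i+1}x_{2i}\cdot 3a$$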
Finally, all partial products, properly shifted, are connected to RCAs to compute the final product of the radix-4 multiplier. To compare performance under the same conditions, a Baugh-Wooley multiplier and a carry save array multiplier were constructed using CPL logic and the same half adder and full adder blocks as the radix-4 multiplier. The multiplicand, "a", and the multiplier, "x", are the two inputs, applied in parallel to the multiplier circuit. A 4-bit binary number can be interpreted as a 2-digit radix-4 number, $x = 4x_1 + x_0$ with digits $x_j \in \{0, 1, 2, 3\}$, and radix-4 multiplication of a k-digit multiplier (k = 2 for a 4-bit multiplier) can be represented by the recurrence

$$p^{(0)} = 0, \qquad p^{(j+1)} = 4^{-1}\left(p^{(j)} + 4^{k}\,x_j\,a\right), \quad j = 0, 1, \ldots, k-1, \qquad p^{(k)} = p$$
where p is the product, a is the multiplicand and x is the multiplier. Based on the multiplication recurrence above, a practical example of radix-4 multiplication is shown in [Figure 4]. The 3a multiple is always computed at the outset and stored in a register for later use, regardless of whether it will actually be needed during the multiplication.
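The right-shift radix-4 recurrence can be exercised directly. The sketch below (a hypothetical `radix4_multiply`, assuming unsigned operands and an even word length) pre-computes 3a once, as the text describes, and checks the result against ordinary multiplication for every pair of 8-bit operands.

```python
def radix4_multiply(a: int, x: int, n_bits: int = 8) -> int:
    """Unsigned radix-4 multiplication via p(j+1) = (p(j) + 4**k * x_j * a) / 4."""
    if n_bits % 2:
        raise ValueError("n_bits must be even")
    k = n_bits // 2                           # number of radix-4 digits of x
    multiples = (0, a, a << 1, (a << 1) + a)  # 0a, 1a, 2a and 3a, computed up front
    p = 0                                     # p(0) = 0
    for j in range(k):
        x_j = (x >> (2 * j)) & 0b11           # j-th radix-4 digit, least significant first
        p = (p + (multiples[x_j] << (2 * k))) >> 2  # exact: the sum is divisible by 4
    return p                                  # p(k) = a * x


assert all(radix4_multiply(a, x) == a * x for a in range(256) for x in range(256))
print(radix4_multiply(0xB7, 0x5C))  # 16836, after 4 digit steps
```

In hardware, the two bits shifted out at each step become low-order product bits rather than being discarded; in this integer model they are always zero, so the shift is an exact division by 4.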
4. Results and Discussion
The radix-4 multiplier design was simulated using standard CMOS 0.35 μm, 0.25 μm, 0.18 μm and 0.12 μm feature sizes. The layout generation tool (Microwind 3) includes an estimate of wire delay, so a fabricated chip may show small variations from the timing presented in this section. Because the design is highly modular, the critical path defines the minimum clock period when radix multiplication is pipelined; other components, such as the block responsible for the final reduction, can be pipelined to match the critical-path delay. [Table 1] compares the simulation results of our proposed 1-bit full adder cell with the 1-bit adder cells proposed by Chang et al. [9] and Alioto and Palumbo [10] in terms of power dissipation, propagation delay, power delay product (PDP) and area. The proposed 1-bit full adder shows a marked improvement in power dissipation, propagation delay and PDP: compared with the full adders proposed by the other authors, power dissipation is reduced by about 50-90%, propagation delay by 8-98% and PDP by 60-99%. The only drawback is that the proposed full adder occupies a much larger area than the other authors' full adders. Our proposed 1-bit adder cell was implemented in 8 × 8 bit radix-4, Baugh-Wooley and CSA multipliers, which were simulated at CMOS 0.35 μm, 0.25 μm, 0.18 μm and 0.12 μm with corresponding supply voltages of 3.5, 2.5, 2 and 1.2 V, respectively. [Table 2] lists the power dissipation, propagation delay, throughput, latency, EPI and PDP of the radix-4, Baugh-Wooley and CSA multipliers.
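The PDP figures and percentage improvements follow from simple arithmetic. The sketch below uses arbitrary illustrative numbers, not values from [Table 1], to show why a PDP reduction tends to exceed the individual power and delay reductions: the two savings multiply.

```python
import math


def power_delay_product(power_w: float, delay_s: float) -> float:
    """PDP in joules: average power dissipation times propagation delay."""
    return power_w * delay_s


def percent_reduction(reference: float, proposed: float) -> float:
    """Improvement of `proposed` relative to `reference`, in percent."""
    return 100.0 * (reference - proposed) / reference


# Illustrative values only: halve both the power and the delay of a reference cell.
ref_power, ref_delay = 2.0e-6, 1.0e-9
new_power, new_delay = 1.0e-6, 0.5e-9

power_gain = percent_reduction(ref_power, new_power)   # 50 %
delay_gain = percent_reduction(ref_delay, new_delay)   # 50 %
pdp_gain = percent_reduction(power_delay_product(ref_power, ref_delay),
                             power_delay_product(new_power, new_delay))
assert math.isclose(pdp_gain, 75.0)                    # 1 - 0.5 * 0.5 = 0.75
print(f"power {power_gain:.0f}%, delay {delay_gain:.0f}%, PDP {pdp_gain:.0f}%")
```
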
Since the CSA representation of intermediate results was used, the word size does not affect the critical path. The results in [Table 2] show the dominant performance of the radix-4 multiplier: it outperforms the Baugh-Wooley and CSA multipliers that we designed with the same adder cell.
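Why a carry-save representation keeps the word size off the critical path can be seen from a bit-parallel model of 3:2 carry-save addition: every bit position is an independent full adder, so no carry travels along the word until the single final carry-propagate addition. The helper names are hypothetical, and the model is behavioural; delay is only described in the comments.

```python
import random


def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 compression of three words into a sum word and a carry word.
    Each output bit uses only bits from one column, so one step costs one
    full-adder delay regardless of the word size."""
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1   # carries move one column, no further
    return sum_word, carry_word


def csa_reduce(operands: list[int]) -> list[int]:
    """Reduce any number of operands to two words with repeated 3:2 steps."""
    words = list(operands)
    while len(words) > 2:
        x, y, z = words.pop(), words.pop(), words.pop()
        words.extend(carry_save_add(x, y, z))
    return words   # a single carry-propagate adder (e.g. an RCA) finishes the sum


rng = random.Random(1)
for width in (8, 16, 64):
    ops = [rng.getrandbits(width) for _ in range(6)]
    s, c = csa_reduce(ops)
    assert s + c == sum(ops)
```

Widening the operands from 8 to 64 bits changes nothing in the loop structure; only the final carry-propagate addition grows with the word size.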
Table 1: 1-bit adder cell comparison in terms of power dissipation, propagation delay, PDP and area

Table 2: 8 × 8 bit radix-4, Baugh-Wooley and CSA multipliers: power dissipation, propagation delay, throughput, latency, EPI and PDP
Our proposed adder-based radix-4 multiplier circuits are compared with other existing results in [Table 3], including those of Costa et al. [2]. The radix-4 multiplier circuit performs better in terms of power and propagation delay: it reduces power consumption by 99.73% and propagation delay by 99.95% compared with the design of Costa et al. [2]. The radix-4 multiplier uses the adder cell designed with the CPL technique, which substantially reduces the transistor count; this is why the radix-4 multiplier circuits require less power and have lower delays. Our circuit also shows lower power consumption and propagation delay than the circuit designed by Chen et al. [12], owing to the smaller number of transistors in the full adder cell and the shorter propagation path, which minimizes the propagation delay.
Several parameters were analyzed and plotted using the BSIM4 analyzer: supply voltage versus power dissipation, supply voltage versus leakage current, capacitance versus power dissipation, and capacitance versus leakage current. For the capacitance plots, the load capacitance was varied and the resulting power dissipation and leakage current were plotted against the total load capacitance. [Figure 5] shows power dissipation versus capacitance for the three types of multipliers; our proposed radix-4 multiplier has the lowest power dissipation. [Figure 6] shows leakage current versus capacitance, and, as [Figure 7] shows, the proposed radix-4 multiplier also has the lowest leakage current of the three multipliers. Since the radix-4 multiplier has lower power consumption and low leakage current, it is suitable for low-power, high-performance applications. [Figure 8] shows power dissipation versus supply voltage for the radix-4, CSA and Baugh-Wooley multipliers; our proposed circuit dissipates less power than the CSA and Baugh-Wooley multiplier circuits at each supply voltage.
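The trends in the capacitance and supply-voltage plots are consistent with the standard first-order model of CMOS switching power, P = αCV²f. The sketch below uses that textbook model, not the BSIM4 models behind [Figure 5] and [Figure 8], and it ignores leakage; the activity factor and clock frequency are arbitrary assumptions.

```python
import math


def dynamic_power(alpha: float, c_load: float, vdd: float, freq: float) -> float:
    """First-order CMOS switching power in watts: alpha * C * Vdd**2 * f."""
    return alpha * c_load * vdd ** 2 * freq


alpha, freq = 0.5, 100e6           # assumed activity factor and clock frequency
base = dynamic_power(alpha, 1e-12, 3.5, freq)

# Doubling the load capacitance doubles the switching power (linear in C).
assert math.isclose(dynamic_power(alpha, 2e-12, 3.5, freq), 2 * base)

# Scaling the supply from 3.5 V to 1.2 V cuts it quadratically, to about 12 %.
ratio = dynamic_power(alpha, 1e-12, 1.2, freq) / base
assert math.isclose(ratio, (1.2 / 3.5) ** 2)
print(f"power at 1.2 V is {100 * ratio:.1f}% of power at 3.5 V")
```
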
Figure 5: Capacitance versus power dissipation for a radix-4 multiplier, CSA multiplier and Baugh-Wooley multiplier.

Figure 6: Capacitance versus leakage current for a radix-4 multiplier, CSA multiplier and Baugh-Wooley multiplier.

Figure 7: Supply voltage versus leakage current for a radix-4 multiplier, CSA multiplier and Baugh-Wooley multiplier.

Figure 8: Supply voltage versus power dissipation for a radix-4 multiplier, CSA multiplier and Baugh-Wooley multiplier.
5. Conclusion
The proposed 1-bit adder circuit was designed using a modified CPL technique and was implemented in radix-4, Baugh-Wooley and CSA multipliers. The proposed adder-based radix-4 multiplier outperforms the Baugh-Wooley multiplier, the CSA multiplier and other authors' multipliers in most respects, including power dissipation, leakage current, propagation delay and PDP, which makes it a candidate for DSP applications. Its lower power dissipation and shorter propagation delay also make it suitable for high-speed, low-power application circuits.
References
1. K Yeo and K Roy, "Low-Voltage, Low-Power VLSI Subsystems," McGraw-Hill, 1998.
2. E Costa, S Bampi, and J Monteiro, "A new architecture for signed radix-2^m pure array multipliers," Proceedings of the 2002 IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD'02), 2002.
3. B Park, M Shin, I C Park, and C M Kyung, "Radix-4 multiplier with regular layout structure," Electronics Letters, vol. 34, no. 15, pp. 1446-7, 1998.
4. B Parhami, "Computer Arithmetic: Algorithms and Hardware Designs," Oxford University Press, 2000. ISBN 0-19-512583-5.
5. K Yano, T Yamanaka, T Nishida, M Saito, K Shimohigashi, and A Shimizu, "A 3.8-ns CMOS 16×16-b multiplier using complementary pass-transistor logic," IEEE Journal of Solid-State Circuits, vol. 25, no. 2, pp. 388-95, 1990.
6. M Psilogeorgopoulos, M Munteanu, T S Chuang, P A Ivey, and L Seed, "Contemporary techniques for lower power circuit design," PREST Deliverable D2.1, version 0.1, 2000.
7. R Zimmermann and W Fichtner, "Low-power logic styles: CMOS versus pass-transistor logic," IEEE Journal of Solid-State Circuits, vol. 32, no. 7, pp. 1079-90, 1997.
8. D Markovic, B Nikolic, and V G Oklobdzija, "A general method in synthesis of pass-transistor circuits," Microelectronics Journal, vol. 31, pp. 991-8, 2000.
9. C H Chang, J Gu, and M Zhang, "A review of 0.18-μm full adder performances for tree structured arithmetic circuits," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 13, pp. 686-95, 2005.
10. M Alioto and G Palumbo, "Analysis and comparison on full adder block in submicron technology," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 10, no. 6, pp. 806-23, 2002.
11. L Sousa and R Chaves, "A universal architecture for designing efficient modulo 2^n+1 multipliers," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 52, pp. 1166-78, 2005.
12. O T-C Chen, S Wang, and Y W Wu, "Minimization of switching activities of partial products for designing low-power multipliers," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 11, no. 3, pp. 418-33, 2003.
13. J D Lee, Y J Yoon, K H Lee, and B G Park, "Application of dynamic pass-transistor logic to an 8-bit multiplier," Journal of the Korean Physical Society, vol. 38, pp. 220-3, 2001.
14. R Mudassir and Z Abid, "New parallel multipliers based on low power adders," Proceedings of the 2005 IEEE CCECE/CCGEI, Saskatoon, pp. 694-7, 2005.
15. M C Wen, S J Wang, and Y N Lin, "Low-power parallel multiplier with column bypassing," Electronics Letters, vol. 41, pp. 1-2, 2005.
Author
C. Senthilpari received the M.Sc. (Applied Electronics) and M.E. (Material Science) degrees from the National Institute of Technology, Trichirappalli (India), and a PhD degree from Multimedia University (Malaysia). His current research interests are VLSI design, high-speed interconnect modelling and design issues in high-performance ICs.