|Year : 2009 | Volume
| Issue : 6 | Page : 282-286
High Speed Vedic Multiplier for Digital Signal Processors
Ramesh Pushpangadan, Vineeth Sukumaran, Rino Innocent, Dinesh Sasikumar, Vaisak Sundar
Department of ECE, College of Engineering, Munnar, PB NO 45 County Hills, Idukki, Kerala, India
|Date of Web Publication||18-Jan-2010|
| Abstract|| |
Digital signal processors (DSPs) are widely used across engineering disciplines, and fast multiplication is central to DSP operations such as convolution and Fourier transforms. This paper proposes a fast multiplication method based on ancient Indian Vedic mathematics. Among the various multiplication methods in Vedic mathematics, Urdhva tiryakbhyam is discussed in detail. Urdhva tiryakbhyam is a general multiplication formula applicable to all cases of multiplication. The algorithm is applied to digital arithmetic and a multiplier architecture is formulated. The design is highly modular: smaller blocks are used to build larger blocks. The design is coded in VHDL (very high speed integrated circuits hardware description language) and synthesized using the Xilinx ISE series. The combinational delay obtained after synthesis is compared with that of the modified Booth Wallace multiplier, which is itself a fast multiplier. This Vedic multiplier can bring about great improvement in DSP performance.
Keywords: Multiplier, Urdhva tiryakbhyam, Vedic mathematics
|How to cite this article:|
Pushpangadan R, Sukumaran V, Innocent R, Sasikumar D, Sundar V. High Speed Vedic Multiplier for Digital Signal Processors. IETE J Res 2009;55:282-6
| 1.Introduction|| |
High speed arithmetic operations are essential in many signal processing applications. The speed of a digital signal processor (DSP) is largely determined by the speed of its multipliers. Multipliers are in fact the most important part of a DSP, since they are needed to realize functions such as fast Fourier transforms and convolutions. Because a processor spends a considerable amount of time performing multiplication, an improvement in multiplication speed can greatly improve system performance. Multiplication can be implemented using many algorithms, such as the array, Booth, carry save, and Wallace tree algorithms.
The computation time of the array multiplier is low because the partial products are computed independently in parallel. Its delay is the time taken by the signals to propagate through the gates that form the multiplication array.
The arrangement of adders is another way of improving multiplication speed. There are two methods for this: the carry save array (CSA) method and the Wallace tree method. In the CSA method, bits are processed one by one to supply a carry signal to an adder located at a one-bit-higher position. The CSA method has its limitations, since the execution time depends on the number of bits of the multiplier. In the Wallace tree method, three bit signals are passed to a one-bit full adder; the sum is supplied to the next-stage full adder of the same bit position, and the carry is supplied to the next-stage full adder located at a one-bit-higher position. In this method, the circuit layout is not easy.
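The three-input full adder that the Wallace tree method relies on can be modelled in software as a 3:2 compressor. The Python sketch below is an illustration only, not the circuit evaluated in this paper; the function names `carry_save_add` and `wallace_style_multiply` are hypothetical, and the way rows are grouped into threes is one simple choice among several.

```python
def carry_save_add(a, b, c):
    """3:2 compressor: reduce three operands to a sum word and a carry word.

    Each bit position acts as an independent full adder, so no carry
    ripples between positions inside this step.
    """
    sum_word = a ^ b ^ c                              # per-bit sum
    carry_word = ((a & b) | (a & c) | (b & c)) << 1   # per-bit carry, one bit higher
    return sum_word, carry_word


def wallace_style_multiply(x, y, width=8):
    """Multiply two unsigned width-bit integers by compressing partial products."""
    # One partial product per multiplier bit, already shifted to its weight.
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(width)]
    # Compress groups of three rows into two rows until only two rows remain.
    while len(rows) > 2:
        next_rows = []
        for i in range(0, len(rows) - 2, 3):
            next_rows.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        next_rows.extend(rows[len(rows) - len(rows) % 3:])  # rows left ungrouped
        rows = next_rows
    # A single carry-propagating addition finishes the product.
    return sum(rows)
```

In this model only the last addition propagates carries across bit positions; every compression step before it works on all bit positions at once.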
The Booth algorithm reduces the number of partial products. However, large Booth arrays are required for high speed multiplication and exponential operations, which in turn require large partial sum and partial carry registers. Multiplication of two n-bit operands using a radix-4 Booth recoding multiplier requires approximately n/(2m) clock cycles to generate the least significant half of the final product, where m is the number of Booth recoded adder stages. Thus, a large propagation delay is associated with this case. The modified Booth encoded Wallace tree multiplier uses the modified Booth algorithm to reduce the number of partial products and the Wallace tree to perform faster additions.
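To make the reduction in partial products concrete, the following Python sketch recodes a multiplier into radix-4 Booth digits. It assumes signed two's-complement operands of even width, which the text above does not specify, and the function names are hypothetical. An n-bit multiplier yields n/2 digits, each in {-2, -1, 0, 1, 2}, so only n/2 partial products need to be added.

```python
def booth_radix4_digits(multiplier, width):
    """Recode a signed two's-complement multiplier into radix-4 Booth digits.

    Returns width/2 digits, least significant first, each in {-2, -1, 0, 1, 2}.
    """
    if width % 2:
        raise ValueError("width must be even for radix-4 recoding")
    bits = multiplier & ((1 << width) - 1)   # two's-complement bit pattern
    digits = []
    prev = 0                                 # implicit bit to the right of the LSB
    for i in range(0, width, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        digits.append(prev + b0 - 2 * b1)    # overlapping three-bit group
        prev = b1
    return digits


def booth_radix4_multiply(multiplicand, multiplier, width):
    """Sum one partial product per Booth digit (width/2 in total)."""
    return sum((digit * multiplicand) << (2 * i)
               for i, digit in enumerate(booth_radix4_digits(multiplier, width)))
```

For example, the 4-bit multiplier 7 is recoded as the two digits -1 and 2, since -1 + 2 x 4 = 7.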
This paper proposes a novel fast multiplier adopting the sutra of ancient Indian Vedic mathematics called Urdhva tiryakbhyam. For 16 x 16 bit multiplication, the proposed multiplier shows a smaller combinational delay than the modified Booth Wallace multiplier.
| 2.FPGA Architecture|| |
This section describes Xilinx field programmable gate arrays (FPGAs), based on the Virtex-II architecture. All Xilinx FPGAs contain the same basic resources: slices (grouped into configurable logic blocks, CLBs), input/output blocks (IOBs), and programmable interconnect. Other resources include memory, multipliers, global clock buffers, and boundary scan logic. The architecture of Virtex-II is shown in [Figure 1]. The slices contain combinational logic and register resources. Each Virtex-II CLB contains four slices. The structure of a single slice is shown in [Figure 2]. Local routing provides feedback between slices in the same CLB and routing to neighboring CLBs. A switch matrix provides access to general routing resources. The major parts of a slice are two look-up tables (LUTs), two sequential elements, and carry logic. The LUTs are known as the F LUT and the G LUT. The sequential elements can be programmed to be either registers or latches. The combinational logic is stored in the LUTs. The input path of the IOB contains two DDR registers. The output path contains two DDR registers and two three-state enable DDR registers. There are separate clocks and clock enables for input and output, whereas the set and reset pins are shared.
Implementation on an FPGA follows the steps shown in [Figure 3].
| 3.Urdhva Tiryakbhyam|| |
Urdhva tiryakbhyam is a multiplication sutra (formula) from Vedic mathematics; the name means "vertically and crosswise". Vedic mathematics is an ancient Indian system of mathematics that was rediscovered by Jagadguru Swami Sri Bharati Krishna Tirthaji Maharaja, who found the basis of the system written in the form of sutras in an appendix of the Atharvaveda. The method is illustrated in [Figure 4].
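Since [Figure 4] is not reproduced here, the following Python sketch shows the vertically-and-crosswise procedure as it is commonly described: result position k collects every product of digits whose positions add up to k, and carries are passed to the next position. The function name `urdhva_tiryakbhyam`, the least-significant-first digit lists, and the `base` parameter are assumptions made for illustration and are not taken from the paper.

```python
def urdhva_tiryakbhyam(a_digits, b_digits, base=10):
    """Multiply two equal-length digit lists (least significant digit first)."""
    n = len(a_digits)
    if len(b_digits) != n:
        raise ValueError("operands must have the same number of digits")
    result = []
    carry = 0
    for k in range(2 * n - 1):
        # Crosswise step: all digit pairs whose positions sum to k.
        column = sum(a_digits[i] * b_digits[k - i]
                     for i in range(max(0, k - n + 1), min(k, n - 1) + 1))
        carry, digit = divmod(column + carry, base)
        result.append(digit)
    while carry:                     # emit any remaining carry digits
        carry, digit = divmod(carry, base)
        result.append(digit)
    return result


# 23 x 14: digits are listed least significant first.
print(urdhva_tiryakbhyam([3, 2], [4, 1]))   # [2, 2, 3], i.e. 322
```

The column sums do not depend on one another, which is why a hardware realization can form them in parallel; only the carry handling links neighboring columns. Passing `base=2` applies the same procedure to binary operands.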
| 4.The Multiplier Architecture|| |
The multiplier architecture is based on the Urdhva tiryakbhyam sutra. The advantage of this algorithm is that the partial products and their sums are calculated in parallel. This parallelism makes the multiplier independent of the clock. The other main advantage of this multiplier over other multipliers is its regularity. Owing to this regular, modular structure, the layout is easy to design. The architecture can be explained using two eight-bit numbers, i.e., the multiplier and the multiplicand are both eight bits wide. The multiplicand and the multiplier are each split into four-bit blocks, and the four-bit blocks are in turn divided into two-bit blocks. According to the algorithm, the 8 x 8 bit multiplication (A x B) proceeds as follows.
A = AH - AL, B = BH - BL, where the dash denotes concatenation of the high and low four-bit halves
A = A7A6A5A4A3A2A1A0
B = B7B6B5B4B3B2B1B0
AH = A7A6A5A4, AL = A3A2A1A0
BH = B7B6B5B4, BL = B3B2B1B0
[Additional file 1]
By the algorithm, the product can be obtained as follows.
A x B = AL x BL + (AH x BL + AL x BH) x 2^4 + AH x BH x 2^8
The parallel multiplications are:
[Additional file 2]
The 4 x 4 bit multiplication can be again reduced to 2 x 2 bit multiplications. The 4 bit multiplicand and the multiplier are divided into two-bit blocks.
AH = AHH - AHL
BH = BHH - BHL
AH x BH = AHL x BHL + (AHH x BHL + AHL x BHH) x 2^2 + AHH x BHH x 2^4
Here the parallel multiplications are
[Additional file 3]
Thus an 8 x 8 multiplication can be decomposed into 2 x 2 multiplication units. Using this algorithm, any N x N multiplication can be implemented with the basic 2 x 2 multiplier units.
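The decomposition described in this section can be sketched in Python as a recursive function that bottoms out in a gate-level 2 x 2 unit, with each partial product shifted to its binary weight before the final addition. This is a behavioral illustration under stated assumptions, not the VHDL design that was synthesized: the names `multiply_2x2` and `vedic_multiply` are hypothetical, operands are unsigned, and the width is assumed to be a power of two.

```python
def multiply_2x2(a, b):
    """Base unit: 2-bit by 2-bit unsigned multiply from AND gates and two half adders."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    p0 = a0 & b0                  # vertical product of the low bits
    x, y = a1 & b0, a0 & b1       # crosswise products
    p1, c1 = x ^ y, x & y         # first half adder: sum and carry
    z = a1 & b1                   # vertical product of the high bits
    p2, p3 = z ^ c1, z & c1       # second half adder: sum and carry
    return p0 | (p1 << 1) | (p2 << 2) | (p3 << 3)


def vedic_multiply(a, b, width):
    """Multiply two unsigned width-bit integers using only 2 x 2 units and adders."""
    if width < 2 or width & (width - 1):
        raise ValueError("width must be a power of two, at least 2")
    if width == 2:
        return multiply_2x2(a, b)
    half = width // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    # The four sub-products are independent; hardware can form them in parallel.
    ll = vedic_multiply(a_lo, b_lo, half)
    hl = vedic_multiply(a_hi, b_lo, half)
    lh = vedic_multiply(a_lo, b_hi, half)
    hh = vedic_multiply(a_hi, b_hi, half)
    return ll + ((hl + lh) << half) + (hh << width)


print(vedic_multiply(0xB7, 0x5C, 8) == 0xB7 * 0x5C)   # True
```

An 8 x 8 multiplication in this sketch uses sixteen 2 x 2 units, and a 16 x 16 multiplication uses sixty-four; the remaining logic consists only of adders.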
| 5.Verification and Implementation|| |
In this work, the algorithms are implemented in VHDL, logic simulation is carried out in the ModelSim simulator, and synthesis is performed using Xilinx Project Navigator.
The families used for synthesis are
XILINX: VIRTEXE: XCV50E:-7
| 6.Results and Discussions|| |
The results are grouped in [Table 1] for different bit widths of the Vedic multiplier. [Table 1] and [Table 2] show the difference in combinational delay between the Vedic multiplier and the modified Booth Wallace multiplier for 8 x 8 and 16 x 16 bit multiplication. The highest performance for both multipliers is seen on the Virtex2p device with a speed grade of -7. The combinational delays of the two multipliers are the same for 8 x 8 multiplication, but for 16 x 16 multiplication the Vedic multiplier shows a markedly smaller delay than the modified Booth Wallace multiplier. The results suggest that the Vedic multiplier is an extremely fast multiplier and is well ahead of the modified Booth Wallace multiplier at the larger operand size.
| 7.Conclusion|| |
The proposed Vedic multiplier proves to be highly efficient in terms of speed. Owing to its regular and parallel structure, it can also be realized easily on silicon. Its main advantage is that the delay increases slowly as the number of input bits increases.
Ramesh Pushpangadan is an Assistant Professor in the Department of ECE, College of Engineering Munnar. He is currently pursuing his PhD at the Indian Institute of Technology Mumbai. His areas of interest are VLSI design and renewable energy.
Vineeth Sukumaran received his B.Tech in Electronics and Communication Engineering from College of Engineering Munnar in 2009. He is currently an employee of State Bank of India. His areas of interest include VLSI design and digital communication theory.
Rino Innocent received his B.Tech in Electronics and Communication Engineering from College of Engineering Munnar in 2009. He is currently working as a communication engineer in a private firm. His areas of interest include digital systems design and power electronics.
Dinesh Sasikumar received his B.Tech in Electronics and Communication Engineering from College of Engineering Munnar in 2009. He is currently a Base Transceiver Station and Microwave commissioning engineer at Calliper Telecom services. His areas of interest include communication systems and signal processing.
Vaisak Sundar completed his B.Tech in Electronics and Communication Engineering from College of Engineering Munnar in 2009. He is currently a lecturer at Rajiv Gandhi Institute of Technology, Bengaluru. His areas of interest are embedded systems and microwave theory.
| References|| |
|1.||H.S. Dhillon, and A. Mitra. "A reduced-bit multiplication algorithm for digital arithmetic". International Journal of Computational and Mathematical Sciences, pp. 64-9, 2008. |
|2.||H. Thapliyal, and H.R. Arabnia. "A Time-Area-Power Efficient Multiplier and Square Architecture Based On Ancient Indian Vedic Mathematics", Proceedings of the 2004 International Conference on VLSI (VLSI'04), Las Vegas, Nevada, June 2004, pp. 434-9. |
|3.||M.C. Hanumantharaju, H. Jayalaxmi, R.K. Renuka, and M. Ravishankar. " A High Speed Block Convolution using Ancient Indian Vedic Mathematics". International Conference on Computational Intelligence and Multimedia. |
|4.||Xilinx university program. "Basic FPGA architecture" pp. 2-46. |
|5.||V.A. Pedroni. "Circuit design with VHDL" pp. 4. |
|6.||W.B. Vasantha Kandasami, and F. Smarandache. "Vedic Mathematics- 'Vedic ' or 'Mathematics': A fuzzy and neutrosophic analysis" pp. 19. |