Year: 2009 | Volume: 55 | Issue: 6 | Page: 260-265
FPGA-based Low Power Audio Subword Sorter Unit
P Karthigaikumar¹, K Baskaran²
¹ Department of Electronics and Communication Engineering, Karunya University, Coimbatore, India
² Department of Computer Science and Engineering, Government College of Technology, Coimbatore, India
Date of Web Publication: 18-Jan-2010
Department of Electronics and Communication Engineering, Karunya University, Coimbatore
Abstract
The security of audio data in high-end communication applications such as satellites and radars is a growing concern. Designing a chip-level processor that meets this requirement is itself a challenge for VLSI engineers. This paper presents a novel HDL-based audio subword sorter unit that is simple in structure and highly efficient in terms of security. We examine a low-power hardware implementation of the powerful group permutation instruction (GRP). The design is described at the integrated-circuit (IC) level in Verilog HDL and can be implemented on an FPGA. To our knowledge, this is the first audio subword sorter unit implemented on an FPGA.
Keywords: Audio subword sorter, Cryptography, Permutation, Multimedia network security
How to cite this article:
Karthigaikumar P, Baskaran K. FPGA-based Low Power Audio Subword Sorter Unit. IETE J Res 2009;55:260-5
1. Introduction
In most cases, multimedia data are packed into subwords of one or two bytes, which word-oriented processors handle in parallel following the single instruction, multiple data (SIMD) model. This is called subword parallelism. To fully exploit subword-parallel operations, the subwords must be rearranged efficiently inside the registers, which calls for efficient permutation support. Efficient handling of permutations is also needed in software implementations of cryptographic algorithms to achieve the required throughput. Consequently, the selection of efficient permutation instructions and the design of fast permutation units have attracted considerable interest.
Under the group permutation instruction (GRP), the data bits whose associated control bits equal one are gathered at the left side of the output, and the data bits whose control bits equal zero are gathered at the right side. This resembles sorting the control bits in descending order, so that all the ones are collected on the left. Designing a hardware unit that executes GRP is therefore equivalent to designing a sorting network that sorts the control bits and moves the data bits along with them. Recent work on designing a permutation unit for audio data sorting has also been reported.
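As a behavioural reference for the GRP operation just described, the Python sketch below models it as a stable partition: data bits whose control bit is 1 move to the left, those whose control bit is 0 move to the right, and the relative order inside each group is kept. The function name `grp`, the list-of-labels representation and the control word are illustrative assumptions, not the authors' Verilog design. The final check shows the equivalence stated in the text: a stable descending sort on the control bits moves the data bits in exactly the same way.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def grp(data: Sequence[T], ctrl: Sequence[int]) -> List[T]:
    """Behavioural GRP model: data bits with control bit 1 go to the left,
    data bits with control bit 0 go to the right, order kept within groups."""
    if len(data) != len(ctrl):
        raise ValueError("data and control words must have the same width")
    ones = [d for d, c in zip(data, ctrl) if c == 1]   # gathered on the left
    zeros = [d for d, c in zip(data, ctrl) if c == 0]  # gathered on the right
    return ones + zeros


# Bits are labelled by name so their movement is easy to follow.
data = ["A7", "A6", "A5", "A4", "A3", "A2", "A1", "A0"]
ctrl = [0, 1, 1, 0, 1, 0, 0, 1]          # hypothetical 8-bit control word

print(grp(data, ctrl))
# ['A6', 'A5', 'A3', 'A0', 'A7', 'A4', 'A2', 'A1']

# Same result from a stable sort of (control, data) pairs on the control bit,
# in descending order: GRP is a sort of the control bits that drags the data.
by_sort = [d for c, d in sorted(zip(ctrl, data), key=lambda p: -p[0])]
assert by_sort == grp(data, ctrl)
```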
Since current microprocessors are word-oriented, bit-level permutations are cumbersome to perform. Every bit has to be extracted from the source register, moved to its new location in the destination register, and combined with the bits that have already been moved. This requires four instructions per bit (mask generation, AND, SHIFT, OR), or 4n instructions for an arbitrary permutation of n bits. Some microprocessors, such as PA-RISC, provide more powerful bit-manipulation instructions, EXTRACT and DEPOSIT, which perform the four per-bit operations in two instructions, giving 2n instructions for an arbitrary permutation of n bits. Predefined permutations with regular patterns, such as the permutations in DES, can of course be done in fewer instructions. In general, however, an arbitrary 64-bit permutation can take 128 (2 × 64) or 256 (4 × 64) instructions on current microprocessors.
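The instruction count above can be illustrated with a short Python sketch of the bit-by-bit approach on a word-oriented machine. Each destination bit costs four word operations (mask generation, AND, SHIFT, OR), so an n-bit permutation costs 4n operations. The function `permute_bits` and its conventions (bit positions counted from the least significant bit, `perm[i]` giving the source of destination bit `i`) are assumptions made for illustration.

```python
from typing import List, Tuple


def permute_bits(x: int, perm: List[int]) -> Tuple[int, int]:
    """Move bit perm[i] of x to bit i of the result, one bit at a time.
    Returns (result, number of word operations used)."""
    result, ops = 0, 0
    for dst, src in enumerate(perm):
        mask = 1 << src                      # 1. generate the mask
        bit = x & mask                       # 2. AND: extract the source bit
        if dst >= src:                       # 3. SHIFT it to its new position
            bit <<= dst - src
        else:
            bit >>= src - dst
        result |= bit                        # 4. OR: merge with bits already moved
        ops += 4
    return result, ops


# Reverse an 8-bit word: destination bit i takes source bit 7 - i.
print(permute_bits(0b10110000, [7 - i for i in range(8)]))   # (13, 32)

# An arbitrary 64-bit permutation needs 4 * 64 = 256 operations this way.
print(permute_bits(1, list(range(64)))[1])                   # 256
```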
Among subword permutation instructions, MAX-2 has a general-purpose PERMUTE instruction that can perform any permutation, with or without repetitions, of the subwords packed in a register; however, it is defined only for 16-bit subwords. IA-64 also has the MUX instruction, which is a fully general permute instruction for 16-bit subwords, with five new permute byte variants. AltiVec has the VPERM instruction, which extends the general permutation capability of MAX-2's PERMUTE instruction to eight-bit subwords selected from two 128-bit source registers into a single 128-bit destination register. Since 16 subwords are selected from 32 candidates, specifying the permutation requires $16 \times \log_2 32 = 16 \times 5 = 80$ bits. VPERM must therefore use another 128-bit register to hold the permutation control bits, making it a very expensive instruction with three source registers and one destination register, all 128 bits wide. None of the subword permutation instructions defined so far can perform arbitrary bit-level permutations efficiently.
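The 80-bit figure and the byte-selection idea behind VPERM can be checked with a simplified Python model. It treats the two 16-byte sources as one 32-byte pool and lets each of the 16 output bytes pick any pool byte with a 5-bit index; details of the real AltiVec instruction (element numbering, layout of the control vector) are deliberately left out, and the function name `vperm_like` is hypothetical.

```python
import math
from typing import Sequence


def vperm_like(va: bytes, vb: bytes, sel: Sequence[int]) -> bytes:
    """Simplified VPERM-style byte permute: output byte k is byte sel[k]
    (0-31) of the 32-byte concatenation va || vb."""
    if len(va) != 16 or len(vb) != 16 or len(sel) != 16:
        raise ValueError("expects two 16-byte sources and 16 selectors")
    pool = va + vb
    return bytes(pool[s & 0x1F] for s in sel)   # only 5 index bits are used


# Control bits needed: 16 selectors, each choosing 1 of 32 bytes.
bits_needed = 16 * int(math.log2(32))
print(bits_needed)   # 80

va = bytes(range(16))            # bytes 0..15
vb = bytes(range(16, 32))        # bytes 16..31
# Selecting indices 31 down to 16 returns the bytes of vb in reverse order.
print(list(vperm_like(va, vb, [31 - k for k in range(16)])))
```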
2. Enhanced Merge Sorter Network
As in the enhanced bitonic sorting network (EBSN), the enhanced merge sorter network (EMSN) contains connected subnetworks, called enhanced merge sorters (EMS), whose purpose is analogous to that of the EBS unit. Again, only the n/2 most significant control bits (MSBs) are needed to guide the data rearrangement. Control bits equal to one among the n/2 MSBs indicate that no exchange should take place at those locations, because the data there must stay fixed. If any of the n/2 most significant control bits is zero, the corresponding bit in the n/2 least significant bits (LSBs) cannot be swapped. This is because such locations carry no relative significance: once swapped, the corresponding bits in the n/2 LSBs become relatively insignificant, and there is no intention of bringing them back to the right side. A signal called 'barrier', introduced after the first level, captures these two constraints. The structure of the EMSN is shown in [Figure 1].
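To make the sorting-network view concrete, the Python sketch below builds Batcher's odd-even merge sorting network (a standard merge-based network, used here in place of the EMS structure of [Figure 1]) and runs it with the control bit as the key and the data bit carried along. It does not model the n/2-MSB simplification or the 'barrier' signal. Note also that a plain sorting network is not stable, so within each group the data order can differ from what GRP produces; only the grouping is guaranteed. All function names are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Comparator = Tuple[int, int]


def _merge(lo: int, hi: int, r: int) -> Iterator[Comparator]:
    # Odd-even merge of the two sorted halves of [lo, hi] with stride r.
    step = r * 2
    if step < hi - lo:
        yield from _merge(lo, hi, step)
        yield from _merge(lo + r, hi, step)
        for i in range(lo + r, hi - r, step):
            yield (i, i + r)
    else:
        yield (lo, lo + r)


def _sort(lo: int, hi: int) -> Iterator[Comparator]:
    # Sort both halves of [lo, hi] (inclusive), then merge them.
    if hi - lo >= 1:
        mid = lo + (hi - lo) // 2
        yield from _sort(lo, mid)
        yield from _sort(mid + 1, hi)
        yield from _merge(lo, hi, 1)


def batcher_network(n: int) -> List[Comparator]:
    """Comparators of Batcher's odd-even merge sorter for n = 2**k inputs."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    return list(_sort(0, n - 1))


def sort_control_and_data(ctrl: Sequence[int], data: Sequence[T]) -> Tuple[List[int], List[T]]:
    """Sort control bits in descending order, moving each data bit with its control bit."""
    c, d = list(ctrl), list(data)
    for i, j in batcher_network(len(c)):
        if c[i] < c[j]:                    # a 1 lies to the right of a 0: exchange
            c[i], c[j] = c[j], c[i]
            d[i], d[j] = d[j], d[i]
    return c, d


ctrl = [0, 1, 1, 0, 1, 0, 0, 1]            # hypothetical control word
data = ["A7", "A6", "A5", "A4", "A3", "A2", "A1", "A0"]
c, d = sort_control_and_data(ctrl, data)
print(c)                 # [1, 1, 1, 1, 0, 0, 0, 0]
print(sorted(d[:4]))     # ['A0', 'A3', 'A5', 'A6']: the bits whose control bit was 1
```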
3. Proposed Modified Enhanced Merge Sorter Network
In this paper, the audio data is a 16-bit signal entering the system in real time. Because the processor is an eight-bit processor, this data is divided into two halves. The advantages of an eight-bit processor over wider processors are discussed in [10]. Each half is fed to the EMS as input, and the sorted data are sent to the transmission line. The main idea of the enhancement is to insert video code signals into the locations that are free and of no relative significance.
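The split of each 16-bit audio sample into two 8-bit halves can be sketched as follows. The sketch assumes signed 16-bit PCM samples handled through their two's-complement bit pattern, and the names `split_sample` and `join_sample` are hypothetical; the join simply reverses the split.

```python
from typing import Tuple


def split_sample(sample: int) -> Tuple[int, int]:
    """Split a signed 16-bit sample into (upper byte, lower byte), one per
    8-bit data path, using its two's-complement bit pattern."""
    if not -0x8000 <= sample <= 0x7FFF:
        raise ValueError("sample must be a signed 16-bit value")
    word = sample & 0xFFFF
    return word >> 8, word & 0xFF


def join_sample(upper: int, lower: int) -> int:
    """Reassemble the two bytes into a signed 16-bit sample."""
    word = (upper << 8) | lower
    return word - 0x10000 if word & 0x8000 else word


print(split_sample(0x1234))   # (18, 52), i.e. (0x12, 0x34)
print(split_sample(-2))       # (255, 254)
print(join_sample(255, 254))  # -2
```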
The main strength of the EMS network lies in its control word: if even one bit of the control word is changed, the whole process and network structure change. The control word used in the earlier work has five 1's and three 0's, so the outputs of the left and right data paths would contain uneven numbers of relatively insignificant bits. In this paper, however, an even number of free locations is needed so that the video bits can be incorporated and reconfigurability can be applied. The control word is therefore changed to have an even number of 1's and 0's, which in turn requires the whole structure to change, and a new structure is designed for the new control word.
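The effect of the control word's balance can be illustrated with a small Python sketch. Under the reading that the left data path keeps the bits selected by 1's and the right path keeps those selected by 0's, the free (relatively insignificant) locations in the two paths are counted by the 0's and the 1's of the control word, respectively. Both control words below are hypothetical, since the actual words are not given, and `free_locations` is an illustrative name.

```python
from typing import Tuple


def free_locations(ctrl: int, width: int = 8) -> Tuple[int, int]:
    """Return (free slots in left path, free slots in right path)."""
    ones = bin(ctrl & ((1 << width) - 1)).count("1")
    zeros = width - ones
    # Left path keeps ctrl = 1 bits, so its free slots are the 0 positions;
    # right path keeps ctrl = 0 bits, so its free slots are the 1 positions.
    return zeros, ones


print(free_locations(0b11101100))   # five 1's, three 0's -> (3, 5): uneven
print(free_locations(0b11001100))   # four 1's, four 0's  -> (4, 4): even
```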
In this method, the data bits are masked with the control bits to obtain the left data path and with the inverted control bits to obtain the right data path. Because the control word has been changed, the same structure cannot be applied to both data paths, so a new structure is designed for each.
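The masking step can be sketched in a few lines of Python, assuming 8-bit data and control words held as integers (the function name `data_paths` and the values are illustrative). The two paths are disjoint and together contain every data bit, so OR-ing them restores the original data word.

```python
from typing import Tuple


def data_paths(data: int, ctrl: int, width: int = 8) -> Tuple[int, int]:
    """Split a data word into left and right data paths using the control word."""
    mask = (1 << width) - 1
    left = data & ctrl & mask        # data masked with the control bits
    right = data & ~ctrl & mask      # data masked with the inverted control bits
    return left, right


left, right = data_paths(0b10110110, 0b11001100)
print(f"{left:08b} {right:08b}")     # 10000100 00110010
assert left | right == 0b10110110 and left & right == 0
```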
Two modifications have been made to the structure. First, the second constraint encoded by the 'barrier' signal concerns the swapping of corresponding locations when the control word bits are '0'; it ensures that relatively insignificant bits sent to the right (LSB) side are not brought back. In the enhanced design, instead of being blocked at the beginning, the relatively insignificant bits are pushed to the right in subsequent levels, which removes the need for this constraint. Second, the 'barrier' signal is introduced in the first level itself, which ensures that the relatively significant bits are not moved.
The earlier work on which this design is based describes only the sorting technique at the transmitter side and gives no method to retrieve (decrypt) the data at the receiving end. New structures have therefore been designed so that the data can be retrieved, with separate structures for decrypting the left and right data paths.
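As an illustration of why retrieval is possible when the receiver knows the control word, the Python sketch below inverts a GRP-style grouping: the number of 1's in the control word says how many leading output bits came from 1-positions, and those bits are dealt back to the 1-positions in order (likewise for the 0's). This is a generic inverse of a stable-partition GRP model, not the separate left and right receiver structures designed in this paper; `grp` and `ungrp` are hypothetical names.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def grp(data: Sequence[T], ctrl: Sequence[int]) -> List[T]:
    """Stable GRP model: control-1 bits to the left, control-0 bits to the right."""
    return [d for d, c in zip(data, ctrl) if c] + [d for d, c in zip(data, ctrl) if not c]


def ungrp(grouped: Sequence[T], ctrl: Sequence[int]) -> List[T]:
    """Inverse of grp for the same control word."""
    k = sum(1 for c in ctrl if c)          # how many bits were gathered on the left
    ones = iter(grouped[:k])
    zeros = iter(grouped[k:])
    # Walk the control word and deal each bit back to its original position.
    return [next(ones) if c else next(zeros) for c in ctrl]


data = ["A7", "A6", "A5", "A4", "A3", "A2", "A1", "A0"]
ctrl = [0, 1, 1, 0, 1, 0, 0, 1]            # hypothetical shared control word
sent = grp(data, ctrl)
print(ungrp(sent, ctrl) == data)            # True
```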
4. Implementation Details of Left Data Path at the Transmitter End
In the first level, positions 1-5, 2-6, 3-7 and 4-8 are checked, and $A_5A_4$ and $A_1A_0$ are interchanged because they have different control bits. A 'swap' signal is generated to denote the locations that have been swapped. In the second level, positions 3-6, 4-7 and 5-8 are checked, so $A_1A_0A_3$ and $A_2A_5A_4$ are swapped. In the third level, positions 4-6 and 5-7 are checked, so $A_0$ and $A_4$, and $A_1$ and $A_5$, are swapped. The structure for this implementation is shown in [Figure 2].
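The level-by-level exchanges described above can be replayed with a short Python sketch. Because the control word itself is not given, the swap decisions (the pairs whose swap signal is 1) are taken directly from the text rather than computed. The sketch also assumes that positions 1 to 8 initially hold $A_7$ to $A_0$, a labelling that agrees with all three levels as described. The names `apply_levels` and `LEFT_TX` are hypothetical.

```python
from typing import List, Sequence, Tuple

Swap = Tuple[int, int]  # pair of 1-indexed positions exchanged at one level


def apply_levels(word: Sequence[str], levels: Sequence[Sequence[Swap]]) -> List[str]:
    """Apply each level's exchanges in order; the pairs within a level are
    disjoint, so in hardware they can be performed in parallel."""
    w = list(word)
    for level in levels:
        for p, q in level:
            w[p - 1], w[q - 1] = w[q - 1], w[p - 1]
    return w


# Assumption: positions 1..8 initially hold A7..A0 (MSB first).
WORD = ["A7", "A6", "A5", "A4", "A3", "A2", "A1", "A0"]

# Exchanges actually performed (swap signal = 1) in the left transmitter path,
# as described in the text: 3-7 and 4-8, then 3-6, 4-7 and 5-8, then 4-6 and 5-7.
LEFT_TX = [[(3, 7), (4, 8)], [(3, 6), (4, 7), (5, 8)], [(4, 6), (5, 7)]]

print(apply_levels(WORD, LEFT_TX))
# ['A7', 'A6', 'A2', 'A1', 'A0', 'A5', 'A4', 'A3']
```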
5. Implementation Details of Left Data Path at the Receiver End
In the first level, positions 3-6 and 4-7 are checked, and $A_0A_4$ and $A_2A_1$ are interchanged because they have different control bits. A 'swap' signal is generated to denote the locations that have been swapped; if the swap bits differ, the corresponding bit positions are not yet in their correct relative order. In the second level, positions 3-5 are checked, so $A_0$ and $A_5$ are swapped. In the third level, positions 5-8 are checked, so $A_0$ and $A_3$ are swapped. The structure for this implementation is shown in [Figure 3].
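Whatever exchanges the transmitter makes, a receiver that knows them (for example, by recomputing them from the shared control word) can undo them by replaying the same exchanges in reverse order, since every exchange is its own inverse. The Python sketch below shows this general principle using the left-path transmitter exchanges listed in Section 4, with positions 1 to 8 assumed to start as $A_7$ to $A_0$. It is not the three-level receiver structure described above, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Swap = Tuple[int, int]   # pair of 1-indexed positions


def apply_levels(word: Sequence[str], levels: Sequence[Sequence[Swap]]) -> List[str]:
    """Transmitter: perform the exchanges level by level."""
    w = list(word)
    for level in levels:
        for p, q in level:
            w[p - 1], w[q - 1] = w[q - 1], w[p - 1]
    return w


def undo_levels(word: Sequence[str], levels: Sequence[Sequence[Swap]]) -> List[str]:
    """Receiver: replay the same exchanges in reverse order to undo them."""
    w = list(word)
    for level in reversed(levels):
        for p, q in reversed(level):
            w[p - 1], w[q - 1] = w[q - 1], w[p - 1]
    return w


WORD = ["A7", "A6", "A5", "A4", "A3", "A2", "A1", "A0"]
LEFT_TX = [[(3, 7), (4, 8)], [(3, 6), (4, 7), (5, 8)], [(4, 6), (5, 7)]]

received = apply_levels(WORD, LEFT_TX)
print(received)                                  # ['A7', 'A6', 'A2', 'A1', 'A0', 'A5', 'A4', 'A3']
print(undo_levels(received, LEFT_TX) == WORD)    # True
```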
6. Implementation Details of Right Data Path at the Transmitter End
In the first level, positions 1-5, 2-6, 3-7 and 4-8 are checked, and $A_7$ and $A_3$ are interchanged because they have different control bits. A 'swap' signal is generated to denote the locations that have been swapped. In the second level, positions 2-5 are checked, so $A_6$ and $A_7$ are swapped. In the third level, positions 1-7 and 2-8 are checked, so $A_0$ and $A_7$ are swapped. The structure for this implementation is shown in [Figure 4].
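For a fixed control word the swap decisions are fixed, so the whole multi-level network behaves like a single rewiring of the eight positions. The Python sketch below collapses the right-path transmitter exchanges described above (1-5, then 2-5, then 2-8, with positions 1 to 8 assumed to start as $A_7$ to $A_0$) into one table such that output position k takes input position `table[k]` (0-indexed). The helper name `wiring` is hypothetical; the point is only that a fixed control word yields a fixed permutation.

```python
from typing import List, Sequence, Tuple

Swap = Tuple[int, int]   # pair of 1-indexed positions exchanged


def wiring(levels: Sequence[Sequence[Swap]], n: int = 8) -> List[int]:
    """Collapse a fixed sequence of exchanges into one table:
    out[k] = inp[table[k]] (0-indexed)."""
    table = list(range(n))          # which input index currently sits at each position
    for level in levels:
        for p, q in level:
            table[p - 1], table[q - 1] = table[q - 1], table[p - 1]
    return table


RIGHT_TX = [[(1, 5)], [(2, 5)], [(2, 8)]]
WORD = ["A7", "A6", "A5", "A4", "A3", "A2", "A1", "A0"]

table = wiring(RIGHT_TX)
print(table)                        # [4, 7, 2, 3, 1, 5, 6, 0]
print([WORD[i] for i in table])     # ['A3', 'A0', 'A5', 'A4', 'A6', 'A2', 'A1', 'A7']
```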
7. Implementation Details of Right Data Path at the Receiver End
In the first level, positions 1-3 and 2-4 are checked, and $A_5A_4$ and $A_3A_0$ are interchanged because they have different control bits. In the second level, positions 1-8 are checked, so $A_3$ and $A_7$ are swapped. In the third level, positions 5-8 are checked, so $A_6$ and $A_3$ are swapped. In the fourth level, positions 2-8 are checked, so $A_0$ and $A_6$ are swapped. The structure for this implementation is shown in [Figure 5].
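Once a transmitter mapping is known as a table with out[k] = in[table[k]], a matching receiver mapping can be derived generically by inverting the table. The Python sketch below uses the table [4, 7, 2, 3, 1, 5, 6, 0], which is what the right-path transmitter exchanges of Section 6 (1-5, 2-5, 2-8) give when positions 1 to 8 start as $A_7$ to $A_0$. This is an alternative, table-based way to derive a receiver, not the four-level structure described above; the name `invert` is hypothetical.

```python
from typing import List, Sequence


def invert(table: Sequence[int]) -> List[int]:
    """Inverse permutation: if out[k] = inp[table[k]], then inp[j] = out[inv[j]]."""
    inv = [0] * len(table)
    for k, src in enumerate(table):
        inv[src] = k
    return inv


RIGHT_TX_TABLE = [4, 7, 2, 3, 1, 5, 6, 0]
WORD = ["A7", "A6", "A5", "A4", "A3", "A2", "A1", "A0"]

sent = [WORD[i] for i in RIGHT_TX_TABLE]      # transmitter output
inv = invert(RIGHT_TX_TABLE)
recovered = [sent[i] for i in inv]            # receiver output
print(inv)                                    # [7, 4, 2, 3, 0, 5, 6, 1]
print(recovered == WORD)                      # True
```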
8. Result Analysis and Discussion
Compared with the earlier design, the following modifications have been made:
- The control word has an even number of 1's and 0's, so video signals can be incorporated evenly into the four data paths along with the audio signals.
- New structures have been designed separately for the left and right data paths. These structures are simpler and more efficient than the existing ones.
- The control word is the basic element of security in this design. Changing even one bit of the control word alters the process completely, so an intruder cannot easily recover the data.
- Decryption of the data paths at the receiver end has also been introduced, with separate structures designed for it. This was not addressed in the earlier design.
No paper related to the FPGA implementation of an audio sorter has appeared so far. We therefore first coded the earlier design, which deals only with encryption, converted its synthesized netlist into a transistor-level netlist, simulated it in the Tanner EDA tool and calculated its power. The same flow was then applied to our proposed design, which covers both encryption and decryption. The power and synthesis results are compared with those of the earlier design. The results are shown in [Table 1], and the power waveforms are shown in [Figure 6] and [Figure 7], respectively.
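The commands used to measure power in the Tanner EDA tool are:
```
* Model cards for the PMOS and NMOS transistors (default parameters)
.model pmos pmos
.model nmos nmos
* Transient analysis: 4 ns output step, 400 ns stop time
.tran 4n 400n
* Print the power of the named source (placeholder kept from the original)
.print p(v /node number/)
```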
9. Conclusion
This paper proposes a novel audio permutation sorter unit for audio applications. It exploits reconfigurable computing, a recent approach to VLSI system design. The audio sorting algorithms have been designed at the chip level using a hardware description language, so that the design can be implemented on a single chip. Low-power techniques are introduced to optimize the design for real-time reliability. The design targets real-time operation and can be incorporated into fields such as satellite data security, radar echo pulse data security and TV system security to provide image and audio security.
10. Acknowledgment
The authors would like to thank the management of Karunya University for all the tools and support to carry out the research successfully. The authors would also like to thank the anonymous reviewers for their constructive comments which helped improve the clarity and presentation of the paper.
Authors
P. Karthigaikumar received his Bachelor of Engineering degree in Electrical and Electronics Engineering from Bharathiar University, India, in 1999 and his Master of Engineering degree with Distinction in Applied Electronics from Bharathiar University, India, in 2003. He has been pursuing a Ph.D. degree at Anna University Coimbatore, India, since 2007, focusing on media security processors. He is a member of the International Association of Engineers (MIAENG) and of the International Association of Computer Science and Information Technology (MIACSIT). He joined Karunya University, Coimbatore, India, in 2000 and is now an Assistant Professor in Electronics and Communication Engineering. His research interests include FPGA implementation of cryptographic algorithms, FPGA implementation of watermarking algorithms and reconfigurable processors.
K. Baskaran received his Bachelor of Engineering degree in Electrical and Electronics Engineering from Annamalai University, India, in 1989, his Master of Engineering degree in Computer Science and Engineering from Bharathiar University, India, in 2002 and his Ph.D. degree from Anna University Chennai, India, in 2006. He is a member of IEEE and ISTE. He is now an Assistant Professor in Computer Science and Engineering at the Government College of Technology, Coimbatore, India. His areas of interest include ad hoc networks and network security.
References
1. T.M. Conte, P.K. Dubey, M.D. Jennings, R.B. Lee, A. Peleg, S. Rathnam, et al., "Challenges to combining general-purpose and multimedia processors," IEEE Computer, vol. 30, no. 12, pp. 33-37, Dec. 1997.
2. I. Kuroda and T. Nishitani, "Multimedia processors," Proc. IEEE, vol. 86, no. 6, pp. 1203-1221, Jun. 1998.
3. R.B. Lee, Z. Shi, and X. Yang, "Efficient permutation instructions for fast software cryptography," IEEE Micro, vol. 21, no. 6, pp. 56-69, Nov./Dec. 2001.
4. Z.J. Shi, "Bit permutation instructions: Architecture, implementation and cryptographic properties," Ph.D. dissertation, Dept. Elect. Eng., Princeton Univ., Princeton, NJ, 2004.
5. Z.J. Shi and R.B. Lee, "Implementation complexity of bit permutation instructions," in Proc. Asilomar Conf. Signals, Syst., Comput., 2003, pp. 879-886.
6. X. Yang and R.B. Lee, "Fast subword permutation units using omega and flip network stages," in Proc. IEEE Int. Conf. Comput. Design, 2000, pp. 15-22.
7. I. Kadayif, P. Nath, M. Kandemir, and A. Sivasubramaniam, "Reducing data TLB power via compiler-directed address generation," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 26, no. 2, pp. 312-324, Feb. 2007.
8. N. Lashkarian, E. Hemphill, H. Tarn, H. Parekh, and C. Dick, "Reconfigurable digital front-end hardware for wireless base-station transmitters: Analysis, design and FPGA implementation," IEEE Trans. Circuits Syst., vol. 54, no. 8, pp. 1666-1677, Aug. 2007.
9. G. Dimitrakopoulos, C. Mavrokefalidis, K. Galanopoulos, and D. Nikolos, "Sorter based permutation units for media-enhanced processors," IEEE Trans. VLSI Syst., vol. 15, no. 6, pp. 711-715, Jun. 2007.
10. P. Karthigaikumar, K. Baskaran, and P. Babu, "A novel argument to use 8 bit processor for low power media application," in Proc. Int. MultiConf. Eng. Comput. Sci. (IMECS), vol. 1, Mar. 2008, pp. 301-306.
11. J.M. Jou, Y.L. Lee, C.Y. Lin, and C.M. Sun, "A novel reconfigurable computation unit for DSP applications," in Proc. IEEE Comput. Soc. Annu. Symp. VLSI (ISVLSI'07), Mar. 2007, pp. 439-444.
[Figure 1], [Figure 2], [Figure 3], [Figure 4], [Figure 5], [Figure 6], [Figure 7]