Translator Disclaimer
2 November 1999 High-performance FFT implementation on the BOPS ManArray parallel DSP
Author Affiliations +
We present a high performance implementation of the FFT algorithm on the BOPS ManArray parallel DSP processor. The ManArray we consider for this application consists of an array controller and 2 to 4 fully interconnected processing elements. To expose the parallelism inherent to an FFT algorithm we use a factorization of the DFT matrix in Kronecker products, permutation and diagonal matrices. Our implementation utilizes the multiple levels of parallelism that are available on the ManArray. We use the special multiply complex instruction, that calculates the product of two complex 32-bit fixed point numbers in 2 cycles (pipelinable). Instruction level parallelism is exploited via the indirect Very Long Instruction Word (iVLIW). With an iVLIW, in the same cycle a complex number is read from memory, another complex number is written to memory, a complex multiplication starts and another finishes, two complex additions or subtractions are done and a complex number is exchanged with another processing element. Multiple local FFTs are executed in Single Instruction Multiple Data (SIMD) mode, and to avoid a costly data transposition we execute distributed FFTs in Synchronous Multiple Instructions Multiple Data (SMIMD) mode.
© (1999) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Nikos P. Pitsianis and Gerald Pechanek "High-performance FFT implementation on the BOPS ManArray parallel DSP", Proc. SPIE 3807, Advanced Signal Processing Algorithms, Architectures, and Implementations IX, (2 November 1999); doi: 10.1117/12.367633;


Super-fast Fourier transform
Proceedings of SPIE (February 15 2006)
Bit-Serial Architectures For Parallel Arrays
Proceedings of SPIE (July 27 1986)

Back to Top